(54) Method and apparatus for processing documents.

(57) In a method and apparatus for processing 
documents (30), a digital image of each docu- 
ment (30) is captured and stored in a memory 
(70). A feature model is formed from the stored 
image by extracting graphical features such as 
horizontal lines and/or boxes from the stored 
image, and this feature model is compared with 
stored feature models from an identification 
feature file (80) to identify the type of the docu-
ment. The identification information is used to
select, from a document definition file (82), an 
appropriate document description for identify- 
ing document zones to be read, and, from a 
library of image processing utilities (84), an 
appropriate processing utility, such as a data 
recognition utility to recognize the information 
read from such zones. The invention has an 
application to a self-service document proces- 
sing terminal (10) for processing documents 
such as cheques (100) to be deposited or bills 
(150) to be paid. 
EP 0 616

This invention relates to a method of processing 
documents of a plurality of types. 

The invention also relates to apparatus for proc- 
essing documents of a plurality of types. 

The processing by financial institutions, such as
banks, of financial documents such as cheques to be
deposited, and bills to be paid is a costly and time-
consuming operation, and can often involve a cus-
tomer wishing to perform a financial transaction, such
as depositing a cheque or paying a bill, in a long wait-
ing time to receive the attention of a bank teller to per-
form the transaction. The financial documents han-
dled by the bank teller may be of many different types
with differing information layouts, and contain infor-
mation which may be handwritten or printed in any
one of a wide variety of typefonts.

It is an object of the present invention to provide 
a method and apparatus for processing documents 
wherein information contained in various locations on 
a variety of different document types can be automat-
ically extracted and processed.

Therefore, according to one aspect of the present
invention, there is provided a method of processing
documents of a plurality of types, including the steps
of: forming a digital image of a document to be proc-
essed; and storing the digital image, characterized by
the steps of: extracting from the stored digital image
graphical features of said document; identifying the
document type to which said document belongs on
the basis of the extracted graphical features; utilizing
the identified document type to select an appropriate
stored document description; utilizing the selected
document description to select at least one zone of
the stored digital image and to select an image proc-
essing program; and processing the information con-
tained in the selected zone or zones in the stored im-
age using the selected image processing program.

According to another aspect of the present inven-
tion, there is provided apparatus for processing docu-
ments of a plurality of types, including transport
means adapted to move a document; imaging means
adapted to provide digital image signals representing
an image of said document; and image storage
means adapted to store said image signals, charac-
terized by processing means coupled to said image
storage means and adapted to extract graphical fea-
tures from the stored image of said document and to
compare the extracted graphical features with corre-
sponding graphical features associated with a plural-
ity of document feature models arranged in a first file
stored in first memory means, to identify the docu-
ment type to which the document belongs; second
memory means for storing a second file of document
zone descriptions and adapted to provide location in-
formation for at least one zone on the document con-
taining data to be processed and parameters defining
the properties of data contained within the respective
zones; third memory means for storing a library of im-
age processing programs for performing image proc-
essing functions; and control means adapted to util-
ize said parameters to select an appropriate image
processing program from said library of image proc-
essing programs for processing the data contained in
said at least one zone.

It will be appreciated that a method and appara- 
tus according to the present invention provides a par- 
ticular advantage in a self-service document process- 
ing terminal where there are a large number of docu- 
ment types capable of being processed, since it is im- 
practical for the terminal user to manually enter infor- 
mation identifying the document type or data location 
and properties of such data. The method and appa- 
ratus according to the present invention enable docu- 
ments of a variety of types to be automatically proc- 
essed by such a self-service terminal. 

One embodiment of the present invention will 
now be described by way of example, with reference 
to the accompanying drawings, in which:- 

Fig. 1 is a perspective view of a self-service docu- 
ment processing terminal incorporating the pres- 
ent invention; 

Fig. 2 is a simplified diagram illustrating the pas- 
sage of a document through the terminal shown 
in Fig. 1; 

Fig. 3 is a block diagram showing the intercon- 
nection of components employed in the terminal 
of Fig. 1; 

Fig. 4 shows a cheque capable of being process-
ed by the terminal of Fig. 1;

Fig. 5 shows a bill capable of being processed by
the terminal of Fig. 1;

Fig. 6 is a flowchart illustrating the selection of a 
transaction type by a customer operating the ter- 
minal of Fig. 1; 

Fig. 7 is a flowchart illustrating the processing of
a cheque by the terminal of Fig. 1;

Fig. 8 is a flowchart illustrating the processing of
a bill by the terminal of Fig. 1;

Fig. 9 is a flowchart illustrating the procedure for
automatically identifying the type of document
being processed by the terminal of Fig. 1; and

Fig. 10 is a flowchart illustrating the processing
of information extracted from a document being
processed by the terminal of Fig. 1.

Referring first to Fig. 1, there is shown a perspec-
tive view of a self-service document processing ter-
minal 10 incorporating the present invention. The ter-
minal 10 is a self-service device adapted for opera-
tion by a customer for the purpose of paying bills, de-
positing cheques or printing a statement of the cus-
tomer's account. It will be appreciated that the termi-
nal 10 is connected in operation to data processing
circuitry (not shown) suitable for electronic funds trans-
fer, whereby the customer's account can be automat-
ically debited, credited or read out.

The terminal 10 includes a slot 12 for entry of the 
customer's banking card, a keyboard 14 for entry of
information and control functions, a slot 16 for receiv-
ing a bill to be paid or a cheque to be deposited and
for delivering a statement to a customer, and a display 
screen 18. 

Referring now to Fig. 2, there is shown a diagram 
illustrating in simplified schematic form the move- 
ment of a document 30 through the terminal 10 (Fig. 
1). The document 30, after insertion in the document
entry slot 16 (Fig. 1) is moved by a document trans- 
port system 32 along a feed path 34 in the direction 
of the arrow 36. The document 30 is fed past an image 
lift device 38 which senses the document 30 and pro- 
vides digital signals representative of the sensed 
areas (pixels) in known manner, such digital signals 
being supplied over a line 40 to a processing system 
42. The document 30 is then fed further along the 
feed path to a wait station 44 where the document is 
held while processing takes place in the processing 
system 42. Following such processing the document 
30 may be fed via a selected diverting flap 46, 48 or 50
to a respective sorting pocket 52, 54 or 56. Alterna- 
tively, the document may be returned by the transport 
system 32 to the entry slot 16 (Fig. 1) along the feed 
path 34 in the opposite direction to the arrow 36. 

Referring now to Fig. 3, the processing system 42 
will be briefly described. As shown in Fig. 3, the image 
lift device 38 is connected over the line 40 to an image 
memory 70, which may be a RAM memory. It should 
be understood that the image lift device 38 is adapted 
to provide both a binary image (black or white pixels) 
used for subsequent processing and a grey scale dig- 
ital image used to provide a visual display of the docu- 
ments on the display 18. The image memory 70 is 
connected to a bus 72 to which are connected a
CPU (central processing unit) 74, the display 18 (Fig. 
1), the keyboard 14, a printer 76 and an encoder 78. 
Also connected to the bus 72 are respective memory 
means containing an identification feature file 80, a 
document definition file 82, a library of image proc- 
essing programs (referred to herein as utilities) 84 
and control software 86. The identification feature
file 80 contains a document description for each type 
of document. Each document description, referred to 
herein as a document feature model, includes a rep- 
resentation of graphical features on the document, 
specified by their locations and measurements on the 
document. Examples of graphical features are hori- 
zontal lines, vertical lines and boxes. Associated with 
each document feature model is a document name, 
identifying the document type. 
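
By way of illustration only, an entry in the identification
feature file 80 might be represented as sketched below. The
class names, field names, pixel units and sample values are
hypothetical and are not taken from the specification; the
sketch merely pairs a document name with graphical features
specified by location and measurement.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class HorizontalLine:
    y: int       # vertical position of the line (pixels, assumed unit)
    x: int       # horizontal position of the left end of the line
    length: int  # measured length of the line


@dataclass(frozen=True)
class Box:
    x: int
    y: int
    width: int
    height: int


@dataclass
class DocumentFeatureModel:
    document_name: str  # identifies the document type
    horizontal_lines: List[HorizontalLine] = field(default_factory=list)
    boxes: List[Box] = field(default_factory=list)


# Hypothetical file contents; names and coordinates are illustrative only.
identification_feature_file = [
    DocumentFeatureModel(
        "cheque_bank_a",
        [HorizontalLine(40, 10, 300), HorizontalLine(80, 10, 300)],
    ),
    DocumentFeatureModel(
        "giro_credit_b",
        [HorizontalLine(30, 5, 200)],
        [Box(220, 20, 60, 25)],
    ),
]
```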

The document definition file 82 contains a list of 
document names, each document name being asso- 
ciated with a document description including a list of 
zones on the document, together with parameters de- 
fining the properties of data contained within the re- 
spective zones. The control software 86 interprets 
these parameters to select an appropriate image
processing utility from the library of image process-
ing utilities 84.

The image processing utilities contained in the li-
brary of image processing utilities 84 are functions
which implement a particular method or technique for
processing image data and making explicit the infor- 
mation contained within the image. 
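
The relationship between the document definition file 82,
the zone parameters and the choice of a utility can be
sketched as follows. This is a minimal sketch under stated
assumptions: the zone coordinates, the rule table and the
mapping from parameters to utility names are hypothetical,
since the specification does not define how the control
software 86 interprets the parameters.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Zone:
    name: str
    box: Tuple[int, int, int, int]  # (left, top, right, bottom), assumed convention
    parameters: FrozenSet[str]      # properties of the data in the zone


# Hypothetical document definition file, keyed by document name.
document_definition_file: Dict[str, List[Zone]] = {
    "cheque_bank_a": [
        Zone("code_line", (10, 180, 400, 200), frozenset({"printed", "OCR"})),
        Zone("courtesy_amount", (300, 60, 400, 90),
             frozenset({"handwritten", "numeric"})),
    ],
}

# Hypothetical rules: the first rule whose required parameters are all
# present in the zone's parameter set determines the utility.
SELECTION_RULES = [
    (frozenset({"OCR"}), "ocr"),
    (frozenset({"handwritten", "numeric"}), "constrained_handprint"),
    (frozenset({"handwritten"}), "unconstrained_handprint"),
]


def select_utility(parameters: FrozenSet[str]) -> str:
    """Return the name of the first utility whose rule matches the parameters."""
    for required, utility_name in SELECTION_RULES:
        if required <= parameters:
            return utility_name
    raise LookupError("no utility matches parameters: %s" % sorted(parameters))


zones = document_definition_file["cheque_bank_a"]
chosen = {zone.name: select_utility(zone.parameters) for zone in zones}
```

Ordering the rules from most to least specific is one simple
way to resolve zones that satisfy more than one rule.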

The terminal 10 is adapted to process more than
one class of documents. One class of document is a
cheque. Referring now to Fig. 4, there is shown a typ-
ical cheque 100 capable of being accepted and proc-
essed by the terminal 10. The cheque 100 is a printed
form containing printed information thereon, and
graphical features such as horizontal lines 102, 104,
106, 108, 110, vertical lines 112, 114 and sloping lines
116, 118. The printed information includes a printed
code line disposed in a code line zone 120. When in-
serted into the terminal 10, the cheque 100 also con-
tains handwritten information, including the date writ-
ten in a date zone 122, the payee's name written in a
payee name zone 124, a handwritten amount, a
courtesy amount written in figures in a courtesy
amount zone 126, and a signature, written in a signa-
ture zone 128. Other zones on the cheque 100 are
blank when the cheque is inserted in the terminal 10
and may be printed by the printer 76 and/or the encod-
er 78 during processing of the document in the termi-
nal 10. These zones include an encoding zone 130
and an endorsement zone 132.

It will be appreciated that cheques emanating
from a large number of different banks may be proc-
essed by the terminal 10. Although such cheques all
contain identical types of information, such as pay-
ee's name, date, amount, bank code and signature,
the actual location of this information may be differ-
ent for the cheques of different banks.

Another class of document which may be proc-
essed by the terminal 10 is a bill for payment, such as
a bank giro credit for payment of funds to a company
such as an electricity supply company, for example.
Referring to Fig. 5, there is shown a typical bill for pay-
ment which, in the example illustrated, is a bank giro
credit form 150. The giro credit form 150 contains
printed information thereon, and graphical features
including horizontal lines such as the referenced hor-
izontal lines 151, 152, 153, and 154, vertical lines
such as the referenced vertical lines 155, 156, and
157 and boxes, such as the box 158. Printed informa-
tion contained on the form 150 includes a customer
reference number contained in a zone 160, the pay-
ee's account number contained in a zone 162, and a
code line contained in a zone 164. When inserted into
the terminal 10, the form 150 will also contain a hand-
written entry of the amount to be paid, contained in a
zone 166.

A typical manner in which the terminal 10 is op- 
erated by a customer will now be described. Referring 
to Fig. 6, which shows a transaction flowchart 200, 
operation is commenced by the customer inserting
his bank card in the slot 12 (Fig. 1) as shown in block 
202. The customer then enters his PIN (personal 
identification number) on the keyboard 14 (block
204), to verify authorized use of the bank card. The 
customer may select a cheque deposit transaction 
(block 208), a bill payment transaction (block 210) or 
a statement print transaction (block 212). The state- 
ment print transaction, which causes the terminal 10 
to print out a statement of the customer's account, is
not pertinent to the present invention, and will not be 
discussed further herein. 

Referring now to Fig. 7, there is shown a flow- 
chart 230 illustrating the operation of the terminal 10 
for a typical cheque deposit transaction (block 232). 
The customer inserts the cheque document into the 
document entry slot 16 (Fig. 1) (block 234). The im- 
age lift device 38 (Figs. 2 and 3) then lifts the docu- 
ment image (block 236) and stores the image in the 
image memory 70 (Fig. 3) (block 238). The document 
is then identified (block 240), that is, a procedure is 
carried out which ascertains the particular cheque 
type, dependent on which bank or financial institution
issued the cheque. The identification procedure will 
be explained more fully hereinbelow. The document 
image is then displayed on the display screen 18 
(block 242) to reassure the customer that the trans- 
action is proceeding correctly, and to allow the cus- 
tomer to view the displayed document. This display 
step is optional and may be omitted in some applica- 
tions. Data is then read from the document image 
(block 244), the location of the text which is read and
the type of reading utility being specified in the docu- 
ment definition file 82 (Fig. 3) for that document type. 
The customer then enters the cheque amount on the 
keyboard 14, and the cheque amount is verified 
(block 248) by comparing the keyed-in amount with 
the courtesy amount read from zone 126 (Fig. 4) dur- 
ing the read data step of block 244. The cheque 
amount is then magnetically encoded on the cheque 
in zone 130 (Fig. 4) (block 250), and the cheque is en-
dorsed by printing thereon (block 252) in the endorse-
ment zone 132, thereby invalidating the cheque to
prevent it being used in a subsequent transaction. As 
shown in block 254 a funds transfer operation is then 
effected, wherein the value of the cheque is transfer- 
red from the payer's account, the account number of 
which was read from the code line zone 120 (Fig. 4), 
to the payee's account, the account number of which 
was read from the bank card inserted in the slot 12 at 
the commencement of terminal operation by the cus- 
tomer. Finally (block 256) the cheque is sent to an ap- 
propriate pocket 52-56 (Fig. 2). 
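
The amount verification of block 248 can be illustrated by
the following sketch, which compares the keyed-in amount
with the text recognized in the courtesy amount zone. The
function names, the acceptance of a comma as a thousands
separator and the rounding to two decimal places are
assumptions made for the example; the specification states
only that the two amounts are compared.

```python
from decimal import Decimal, InvalidOperation


def parse_amount(text):
    """Parse text such as '1,250.00' into a Decimal, or return None if unreadable."""
    cleaned = text.strip().replace(",", "")
    try:
        return Decimal(cleaned).quantize(Decimal("0.01"))
    except InvalidOperation:
        return None


def verify_amount(keyed_in, recognized):
    """True only if both amounts are readable and numerically equal."""
    keyed = parse_amount(keyed_in)
    read = parse_amount(recognized)
    return keyed is not None and keyed == read
```

An unreadable recognition result, for example a letter read
in place of a digit, is treated as a failed verification
rather than as an error.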

Referring now to Fig. 8, there is shown a flow-
chart 260 illustrating the operation of the terminal 10
for a typical bill payment transaction (block 262). The
customer inserts the bill document into the document
entry slot 16 (Fig. 1) (block 264). As in the cheque
reading operation, the document image is lifted (block
266) and stored (block 268). The document is then
identified (block 270) as will be explained more fully
hereinbelow. The document image is displayed (block
272), and data is read from the document (block 274),
the location of the text being read and the type of
reading utility being specified in the document defi-
nition file 82 (Fig. 3). The customer then enters the bill
amount on the keyboard 14 (block 276), and the bill
amount is verified (block 278) by comparing the
keyed-in amount with the bill amount read from the bill
during the read data step of block 274. This customer
keyboard entry step 276 and verification step 278 are
optional and may be omitted in some applications.
The terminal 10 then communicates with the central
data processor to effect a transfer of the appropriate
funds (block 280). The bill will then be endorsed by
printing thereon an appropriate text to indicate that
the bill has been paid and the date of payment, for ex-
ample (block 282), and the endorsed bill is then re-
turned to the customer (block 284).

Referring now to Fig. 9, the procedure for identi-
fying a document (block 240 in Fig. 7 and block 270
in Fig. 8) will now be explained. The identification pro-
cedure is effected by an algorithm which proceeds ac-
cording to the flowchart 300 shown in Fig. 9. Thus, the
document image stored in the image memory 70 (Fig.
3) is scanned (block 302), and graphical features
thereof are extracted (block 304). In the preferred em-
bodiment the extracted graphical features are linear
features, in particular horizontal lines, including the
lines 102-110 on the cheque 100 shown in Fig. 4 or the
lines 151-154 on the bill 150 shown in Fig. 5. How-
ever, in alternative arrangements, other graphical
features in addition to horizontal lines, such as verti-
cal lines, may be extracted. Furthermore, where hori-
zontal and vertical lines are extracted, the locations
thereof in the digital image may be utilized to identify
further graphical features such as boxes (rectangular
areas) or other simple geometrical configurations.
The extracted graphical features are then arranged in
a feature hierarchy specifying their locations in the
digital image and thereby providing a graphical fea-
ture description (block 306) constituting a document
feature model. This document feature model is then
compared (block 308) with document feature models
(block 310) extracted from the identification feature
file 80. In a preferred comparison procedure, the
document feature model derived from the document
entered into the slot 16 of the terminal 10 is compared
with entries in the identification feature file 80 to
identify the best match. Optionally, this match may be
tested by ascertaining whether a known feature, such
as the existence of a particular printed word or logo
for that type of document, is present. If this feature is
not present, then the next best match may be chosen
as a candidate, and tested for a known feature, and
this procedure may be repeated until the document is
identified.
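
The comparison procedure, including the optional test for
a known feature, can be sketched as below. The similarity
measure (the fraction of lines that agree within a pixel
tolerance) and the callback used to test for the known
feature are assumptions introduced for illustration; the
specification does not prescribe a particular matching
metric.

```python
from typing import Callable, Dict, List, Optional, Tuple

Line = Tuple[int, int, int]  # (y, x, length), hypothetical representation


def match_score(extracted: List[Line], model: List[Line], tol: int = 3) -> float:
    """Share of model lines found among the extracted lines, within tol pixels."""
    if not model or not extracted:
        return 0.0

    def close(a: Line, b: Line) -> bool:
        return all(abs(p - q) <= tol for p, q in zip(a, b))

    hits = sum(1 for m in model if any(close(m, e) for e in extracted))
    # Dividing by the larger count penalizes both missing and surplus lines.
    return hits / max(len(model), len(extracted))


def identify(
    extracted: List[Line],
    feature_file: Dict[str, List[Line]],
    has_known_feature: Callable[[str], bool],
) -> Optional[str]:
    """Try candidates from best to worst match until one passes the known-feature test."""
    ranked = sorted(
        feature_file,
        key=lambda name: match_score(extracted, feature_file[name]),
        reverse=True,
    )
    for name in ranked:
        if has_known_feature(name):
            return name
    return None


feature_file = {
    "cheque_a": [(40, 10, 300), (80, 10, 300)],
    "cheque_b": [(40, 10, 300), (120, 10, 300)],
}
extracted = [(41, 10, 299), (80, 11, 300)]
best = identify(extracted, feature_file, lambda name: True)
fallback = identify(extracted, feature_file, lambda name: name == "cheque_b")
```

In the usage lines, `best` is the top-ranked candidate, and
`fallback` shows the next best match being chosen when the
known feature is absent from the first candidate.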

It will be appreciated that the identification fea-
ture file 80 (Fig. 3) is compiled by extracting graphical
features, such as horizontal/vertical lines and/or
boxes, from the digital image of a sample document
of each type. Thus, as previously discussed, the iden-
tification feature file 80 consists of a plurality of docu-
ment models listing the graphical features being util-
ized, and their location, together with the appropriate
document name.
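
A minimal sketch of compiling such a file from sample
images is given below, assuming a binary image held as rows
of 0 (white) and 1 (black) pixels. The run-length approach,
the function names and the minimum length threshold are
assumptions for the example; a practical extractor would
also have to merge runs on adjacent rows that belong to one
thick line and tolerate skew.

```python
def extract_horizontal_lines(image, min_length=5):
    """Return (y, x, length) for each horizontal run of black pixels of at least min_length."""
    lines = []
    for y, row in enumerate(image):
        x = 0
        while x < len(row):
            if row[x] == 1:
                start = x
                while x < len(row) and row[x] == 1:
                    x += 1
                if x - start >= min_length:
                    lines.append((y, start, x - start))
            else:
                x += 1
    return lines


def compile_feature_file(samples, min_length=5):
    """Map each document name to the lines extracted from its sample image."""
    return {
        name: extract_horizontal_lines(image, min_length)
        for name, image in samples.items()
    }


# Tiny illustrative sample: one long run on row 1, one short run on row 3.
sample = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
]
feature_file = compile_feature_file({"cheque_a": sample})
```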

Referring now to Fig. 10, the procedure for read-
ing data from an identified document (block 244 in
Fig. 7 and block 274 in Fig. 8) will now be explained.
The read data procedure is specified by the flowchart
320. Firstly, using the document name representing
the identified document type, the corresponding
document description is accessed from the document
definition file 82 (Fig. 3). As discussed hereinabove,
it is again emphasized that documents of a single
class (e.g. cheques) can be of many different types.
Thus, there may be many hundreds of different types
of cheques in circulation. However, as mentioned, all
such cheques will contain the same essential infor-
mation, such as, for example, the payee's name, the
amount, bank sort code, account number, date and
signature. The variations between cheque types, for
example, may be in the position of the data, i.e. the lo-
cation on the document of the zone containing the
data, and the format of the data (i.e. printed, handwrit-
ten, numeric, OCR etc.). For each document type, the
document definition file 82 (Fig. 3) contains the rele-
vant data including the location of the zones contain-
ing the data and parameters defining the properties
of the data.

Proceeding with the flowchart 320, the data loca-
tion is retrieved from the document description (block
324). Next, as shown in block 326, a sub-image con-
taining the zone or zones which are to be read is ex-
tracted from the document image stored in the image
memory 70 (Fig. 3). Also retrieved from the document
definition file 82 (Fig. 3) are parameters defining the
type of data to be read (block 328), on the basis of
which the appropriate recognition utility to read the
data is selected from the library of image processing
utilities 84 (Fig. 3) (block 330). Finally, as shown in
block 332, the selected utility is applied to the extract-
ed sub-image zone containing the data, whereby the
data is recognized for further processing according to
the flowchart 230 (Fig. 7) or the flowchart 260 (Fig. 8).
Examples of typical parameters stored in the docu-
ment definition file 82 are: handwritten, numeric, al-
phanumeric, fixed/variable length, fixed/variable
pitch, font, cursive script, and number of characters.
Examples of typical image processing utilities con-
tained in the library 84 are: OCR, constrained hand-
print, unconstrained handprint, omnifont print and
cursive script. Using such utilities, characters, for ex-
ample, contained in the zone may be recognized.
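
The sequence of blocks 326 to 332 can be sketched as
follows. The zone representation, the (left, top, right,
bottom) box convention and the function names are
hypothetical, and the "utility" used in the example merely
counts black pixels as a stand-in for a real recognition
utility, which is outside the scope of the sketch.

```python
def extract_sub_image(image, box):
    """Crop the zone given as (left, top, right, bottom); right and bottom are exclusive."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def read_zones(image, zones, select_utility):
    """Apply to each zone the utility chosen from that zone's parameters."""
    results = {}
    for zone in zones:
        sub_image = extract_sub_image(image, zone["box"])  # block 326
        utility = select_utility(zone["parameters"])       # blocks 328 and 330
        results[zone["name"]] = utility(sub_image)         # block 332
    return results


def count_black(sub_image):
    """Stand-in utility: counts black pixels instead of recognizing characters."""
    return sum(sum(row) for row in sub_image)


image = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
zones = [
    {"name": "amount", "box": (1, 1, 4, 3), "parameters": ("handwritten", "numeric")},
]
result = read_zones(image, zones, lambda parameters: count_black)
```

Passing the selection step in as a function keeps the
reading loop independent of how the parameters are mapped
to utilities.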

According to a modification of the described em-
bodiment, one or more zones on the identified docu-
ment may be processed by image processing utilities
which merely validate a zone, rather than read it, for 
example check that a zone on the document has been 
completed, e.g. the date zone 122 or the signature 
zone 128 on the cheque 100 (Fig. 4). 
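
A zone validation utility of this kind might, as one simple
possibility, test whether the zone contains more than a
minimum proportion of black pixels. The function name, the
box convention and the threshold value below are
assumptions; pre-printed lines inside a zone would have to
be discounted in practice.

```python
def zone_is_completed(image, box, min_ink_fraction=0.02):
    """True if the share of black pixels (1s) in the zone reaches min_ink_fraction."""
    left, top, right, bottom = box  # right and bottom are exclusive
    pixels = [p for row in image[top:bottom] for p in row[left:right]]
    if not pixels:
        return False  # an empty or out-of-range zone cannot be validated
    return sum(pixels) / len(pixels) >= min_ink_fraction


image = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
signed = zone_is_completed(image, (1, 1, 4, 3))  # zone containing ink
blank = zone_is_completed(image, (0, 3, 6, 4))   # zone left empty
```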



Claims 

1. A method of processing documents of a plurality
of types, including the steps of: forming a digital 
image of a document (30) to be processed; and 
storing the digital image, characterized by the 
steps of: extracting from the stored digital image 
graphical features of said document (30); identi- 
fying the document type to which said document 
(30) belongs on the basis of the extracted graph- 
ical features; utilizing the identified document 
type to select an appropriate stored document 
description; utilizing the selected document de- 
scription to select at least one zone of the stored 
digital image and to select an image processing 
program; and processing the information con- 
tained in the selected zone or zones in the stored 
image using the selected image processing pro- 
gram. 

2. A method according to claim 1, characterized in 
that said identifying step is effected on the basis 
of extracted horizontal lines. 

3. A method according to claim 2, characterized in 
that said identifying step is effected on the basis 
of extracted horizontal lines, vertical lines and 
boxes. 

4. A method according to any one of the preceding 
claims, characterized in that the selected image 
processing program is a character recognition 
program adapted to recognize characters con- 
tained in the selected zone. 

5. A method according to any one of claims 1 to 3, 
characterized in that the selected image process- 
ing program is a zone validation program adapted 
to check that a zone on the document has been 
completed. 

6. A method according to any one of the preceding 
claims, characterized by the step of displaying 
the stored digital image on display means (18). 

7. Apparatus for processing documents of a plural-
ity of types, including transport means (32)
adapted to move a document (30); imaging
means (38) adapted to provide digital image sig-
nals representing an image of said document
(30); and image storage means (70) adapted to
store said image signals, characterized by proc-
essing means (74) coupled to said image storage
means (70) and adapted to extract graphical fea-
tures from the stored image of said document
(30) and to compare the extracted graphical fea-
tures with corresponding graphical features as-
sociated with a plurality of document feature
models arranged in a first file (80) stored in first
memory means, to identify the document type to
which the document (30) belongs; second mem-
ory means for storing a second file (82) of docu-
ment zone descriptions and adapted to provide
location information for at least one zone on the
document (30) containing data to be processed
and parameters defining the properties of data
contained within the respective zones; third
memory means for storing a library of image
processing programs (84) for performing image
processing functions; and control means (86)
adapted to utilize said parameters to select an
appropriate image processing program from said
library of image processing programs for proc-
essing the data contained in said at least one
zone.

8. Apparatus according to claim 7, characterized in
that the selected image processing program is a
character recognition program.

9. Apparatus according to claim 7 or claim 8, char-
acterized by display means (18), responsive to
the stored digital image of said document (30) to
provide a visual display of said document (30).